Enhancing Dataset Quality using Keys
Authors
Abstract
The Linked Data principles provide a decentralized approach for publishing structured data in RDF on the Web. A consequence of this architectural choice is a high variance in the quality of the RDF datasets that constitute the Linked Data cloud. In this demo paper, we address a particular aspect of quality, namely the discriminability of resources. During our demo, we will present our simple three-step approach and interface, which allow data publishers to detect the resources in their dataset that are indistinguishable with respect to a given set of properties. Our approach is highly scalable as it relies on ROCKER, a novel algorithm for key discovery. Our evaluation on DBpedia suggests that even very commonly used data sources are still in need of significant improvement to abide by the discriminability criterion.
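To make the notion of discriminability concrete, the following is a minimal sketch of how resources that are indistinguishable with respect to a given set of properties could be detected. It is not ROCKER and not the authors' implementation: it assumes the dataset is already available as an in-memory list of (subject, predicate, object) tuples, and the function name `indistinguishable_groups` and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict


def indistinguishable_groups(triples, properties):
    """Return groups of subjects whose value sets agree on every given property.

    Assumption: `triples` is an iterable of (subject, predicate, object) tuples
    and `properties` is an ordered sequence (list or tuple) of predicates.
    Subjects with no triple for any of the properties are ignored.
    """
    # subject -> property -> set of objects observed for that property
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p in properties:
            values[s][p].add(o)

    # Subjects with the same signature cannot be told apart by `properties`.
    groups = defaultdict(list)
    for s, by_prop in values.items():
        signature = tuple(frozenset(by_prop.get(p, ())) for p in properties)
        groups[signature].append(s)

    # Only groups with more than one member violate discriminability.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical toy data, for illustration only.
triples = [
    ("ex:a", "ex:name", "Springfield"),
    ("ex:a", "ex:country", "US"),
    ("ex:b", "ex:name", "Springfield"),
    ("ex:b", "ex:country", "US"),
    ("ex:c", "ex:name", "Shelbyville"),
    ("ex:c", "ex:country", "US"),
]

print(indistinguishable_groups(triples, ["ex:name", "ex:country"]))
```

In this toy data, `ex:a` and `ex:b` share both name and country, so the property set {`ex:name`, `ex:country`} is not a key for the dataset. A set of properties acts as a key exactly when the function returns an empty list.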
Similar resources
Automatic segmentation of glioma tumors from BraTS 2018 challenge dataset using a 2D U-Net network
Background: Glioma is the most common primary brain tumor, and early detection of tumors is important in treatment planning for the patient. The precise segmentation of the tumor and intratumoral areas on MRI by a radiologist is the first step in the diagnosis; besides being time-consuming, it can also yield different diagnoses from different physicians. The aim of this study...
Fuzzy Key Linkage Robust Data Mining Methods for Real Databases
Results of data mining depend heavily on the quality of linkage keys within a search dataset and within its database target. Linkage failures due to errors or variations in linkage keys have few symptoms, and can hide or distort what data have to tell us. More robust methods have promise as remedies, but require careful planning and understanding of specialized technologies. A tour of fuzzy lin...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Image Segmentation using Improved Imperialist Competitive Algorithm and a Simple Post-processing
Image segmentation is a fundamental step in many image processing applications. In most cases, the image's pixels are clustered based only on intensity or color information, and neither spatial nor neighborhood information is used in the clustering process. Considering the importance of including spatial information of pixels which improves the quality of image segmentati...
Iris Recognition System Based on Texture Features
Iris recognition has become one of the most common methods for identification, alongside passwords, keys, etc. In this paper, a new iris recognition system based on texture is proposed to recognize persons using low-quality iris images. First, the iris area is located, and then a new method for eyelash and eyelid detection is applied. The introduced method depends on making image statis...